Toolbox model of evolution of metabolic pathways on networks of arbitrary topology

Authors

  • Tin Yau Pang
  • Sergei Maslov
Abstract

Background: In prokaryotic genomes the number of transcriptional regulators is known to scale quadratically with the total number of protein-coding genes. The toolbox model of evolution was recently proposed to explain this scaling for metabolic enzymes and their regulators. According to its rules, the metabolic network of an organism evolves by horizontal transfer of pathways from other species. These pathways are part of a larger “universal” network formed by the union of all species-specific networks. It remained to be understood, however, how the topological properties of this universal network influence the scaling law of the functional content of genomes in the toolbox model.

Methodology/Principal Findings: In this study we answer this question by first analyzing the scaling properties of the toolbox model on arbitrary tree-like universal networks. We prove that critical branching topology, in which the average number of upstream neighbors of a node is equal to one, is both necessary and sufficient for quadratic scaling. Conversely, the toolbox model on trees with exponentially expanding topology is characterized by linear scaling with logarithmic corrections. We further generalize the rules of the model to incorporate reactions with multiple substrates/products as well as branched and cyclic metabolic pathways. To achieve its metabolic tasks, the new model employs evolutionarily optimized pathways with the smallest number of reactions. Numerical simulations of this realistic model on the universal network of all reactions in the KEGG database produced approximately quadratic scaling between the number of pathways (and hence of their regulators) and the size of the network. To quantify the geometrical structure of individual pathways in this model, we investigated the relationship between their number of reactions and the numbers of byproduct, intermediate, and feedback metabolites.
Conclusions/Significance: Our results validate and explain the ubiquitous appearance of quadratic scaling for a broad spectrum of topologies of the underlying universal metabolic networks. They also demonstrate why, in spite of their “small-world” topology, real-life metabolic networks are characterized by a broad distribution of pathway lengths and of the sizes of metabolic regulons in regulatory networks.

Author summary

In prokaryotic genomes the number of transcriptional regulators is known to be proportional to the square of the total number of protein-coding genes. The toolbox model of co-evolution of metabolic and regulatory networks was recently proposed to explain this scaling. In this model prokaryotes acquire new metabolic capabilities by horizontal transfer of metabolic enzymes and/or entire pathways from other organisms. One can conveniently think of these new pathways as coming from a “universal network” formed by the union of the metabolic repertoires of all potential donor organisms. While the qualitative toolbox argument does not depend on specific details of the model, the exponent characterizing this scaling can in principle be model-dependent. The question we address in this study is: how does the topology of the universal network determine this exponent? We first mathematically derive the quadratic scaling for a broad range of tree-like network topologies. We then propose and study the most realistic version of the model, which incorporates metabolic reactions with multiple substrates/products and evolutionarily optimized pathways built from the minimal number of KEGG reactions sufficient to achieve a given metabolic task. This new model combines quadratic scaling with an interesting geometrical structure of individual pathways involving byproduct, intermediate, and feedback metabolites.

Introduction

In prokaryotic genomes the number of transcriptional regulators is known to scale quadratically with the total number of protein-coding genes [1].
The toolbox model of co-evolution of metabolic and regulatory networks was recently proposed [2] to explain this scaling in the parts of the genome responsible for metabolic functions. In this model prokaryotes acquire new metabolic capabilities by horizontal transfer of entire metabolic pathways from other organisms. One can conveniently think of these new pathways as coming from a “universal network” formed by the union of the metabolic repertoires of all potential donor organisms. The essence of the toolbox argument [2] can be summarized as follows: as the non-regulatory part of an organism's genome (its “toolbox”) grows, it typically needs to acquire fewer and fewer new genes (“tools”) for each new pathway offering it some new metabolic capability (e.g. the ability to utilize a new nutrient or to synthesize a new metabolic product). As a consequence, the number of pathways, and by extension the number of their transcriptional regulators, grows faster than linearly with the number of non-regulatory genes in the genome. While this qualitative explanation is rather general and therefore does not depend on specific details such as the topology of the universal network, the exact value of the exponent $\alpha$ connecting the number of transcription factors (equal to $N_L$, the number of pathways or leaves of the network) to the number of metabolites $N_M$ in the metabolic network of an organism, $N_L \sim N_M^{\alpha}$, is in general model-dependent. In [2] we mathematically derived the quadratic scaling ($\alpha = 2$) for the toolbox model with linear pathways on a fully connected graph, in which any pair of metabolites can in principle be converted into each other in just one step via a single metabolic reaction. While this situation is obviously unrealistic from a biological standpoint, before the present study it remained the only mathematically tractable variant of the toolbox model.
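As a minimal illustration of what the exponent $\alpha$ in $N_L \sim N_M^{\alpha}$ measures, the sketch below estimates it as the least-squares slope of $\log N_L$ against $\log N_M$. The function name `fit_exponent` and the data points are hypothetical and serve only to show the calculation; they are not data from [1] or [2].

```python
import math


def fit_exponent(n_m, n_l):
    """Least-squares slope of log(N_L) versus log(N_M), i.e. alpha in N_L ~ N_M**alpha."""
    xs = [math.log(x) for x in n_m]
    ys = [math.log(y) for y in n_l]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical synthetic data that obey N_L = 0.01 * N_M**2 exactly
n_m = [100, 200, 400, 800]
n_l = [0.01 * m ** 2 for m in n_m]
print(round(fit_exponent(n_m, n_l), 6))  # 2.0
```

On real genome data the points scatter around the fitted line, so the slope is only an estimate of $\alpha$.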
The universality of the exponent $\alpha = 2$ was then corroborated [2] by numerical simulations of the toolbox model with linearized pathways on the universal network formed by the union of all metabolic reactions in the KEGG database. While the agreement between the values of the exponent $\alpha$ in these two cases hinted at general underlying principles at work, a detailed understanding of these principles remained elusive. The question we address in this study is: how does the topology of the universal network affect the scaling between $N_L$ and $N_M$? To answer this question we first consider and solve a more realistic (yet still mathematically tractable) case in which the universal metabolic network is a directed tree of arbitrary topology. While closer to reality than the previously solved [2] case of a fully connected network, the toolbox model on a tree-like universal network still retains a number of simplifications, such as strictly linear pathways and one-substrate, one-product reactions. To make our approach even more realistic, we then propose and numerically study a completely new version of the toolbox model incorporating metabolic reactions with multiple substrates and products as well as branched and cyclic metabolic pathways. Furthermore, unlike random linear pathways on a universal network [2], which can be long and therefore suboptimal from an evolutionary standpoint, the new model uses evolutionarily optimized pathways with the smallest number of reactions from the KEGG database sufficient to achieve a given metabolic task.

Results

Toolbox model on a tree-like universal network: general mathematical description

We will first consider the case where the universal metabolic network is a directed tree. For simplicity, in this section we consider catabolic pathways, while identical arguments (albeit with the opposite direction of all reactions) apply to anabolic pathways.
The root of the tree corresponds to the central metabolic core of the organism responsible for biomass production. Peripheral catabolic pathways (branches of the tree) convert external nutrients (leaves) into metabolites of this core, while the internal nodes of the tree represent intermediate metabolites. Each metabolite is characterized by its distance $0 \le d \le d_{\max}$ from the root of the network. The universal network has $N_M^{(U)}(d)$ metabolites at distance $d$ from the root, which include $N_L^{(U)}(d)$ leaves (nutrients used in the first step of catabolic pathways) and $N_B^{(U)}(d)$ branching points, i.e. intermediate metabolites generated by more than one metabolic reaction from the next level (see Fig. 1). At distance $d$ from the root, an organism-specific network (filled circles and thick edges in Fig. 1) contains $N_M(d) \le N_M^{(U)}(d)$ metabolites, composed of $N_L(d) \le N_L^{(U)}(d)$ leaves, $N_B(d) \le N_B^{(U)}(d)$ branching points, and $N_M(d) - N_L(d) - N_B(d)$ metabolites inside linear branches (“one reaction in, one reaction out”). For simplicity we assume that in the universal network (and thus also in any of its organism-specific subnetworks) no more than two reaction edges can combine at any node (metabolite); the most general case of an arbitrary distribution of branching numbers can be treated in a very similar fashion. The toolbox model specifies the rules by which an organism acquires new pathways in the course of its evolution. It consists of the following steps:

1) randomly pick a new nutrient metabolite (a leaf node of the universal network that does not yet belong to the metabolic network of the organism);
2) use the universal network to identify the unique linear pathway that connects the new nutrient to the root of the tree (the metabolic core);
3) add the reactions and intermediate metabolites of the new pathway to the metabolic network of the organism (filled circles and thick edges in Fig. 1).
One only needs to add those enzymes that are not yet present in the “genome” of the organism. Graphically, this means that the new branch of the universal network is extended only until it first intersects the existing metabolic network of the organism.

Figure 1. An example of an organism-specific metabolic network (filled circles and thick edges), which is always a subset of the universal network (the entire tree). Nodes are divided into layers based on their distance $d$ from the root of the tree. The variables $N_M^{(U)}(d)$, $N_B^{(U)}(d)$, $N_L^{(U)}(d)$ for the universal network and $N_M(d)$, $N_B(d)$, $N_L(d)$ for the species-specific network are illustrated using the layer $d = 3$ as an example.

Consider an organism capable of utilizing $N_L \le N_L^{(U)}$ nutrients represented by leaves in the universal network, where $N_L = \sum_{d=1}^{d_{\max}} N_L(d)$ and $N_L^{(U)} = \sum_{d=1}^{d_{\max}} N_L^{(U)}(d)$. Since we assume that each nutrient utilization pathway is controlled by a dedicated transcriptional regulator sensing its presence or absence in the environment (e.g. LacR for lactose), the corresponding regulatory network would also have $N_L$ transcription factors (in the model we ignore transcription factors controlling non-metabolic functions). The non-regulatory part of the genome consists of $N_M = \sum_{d=1}^{d_{\max}} N_M(d)$ enzymes catalyzing metabolic reactions (strictly speaking, $N_M$ is the number of metabolites/nodes, so the number of enzymes/edges is $N_M - 1$). Quadratic scaling plots [1] show the number of transcriptional regulators $N_R = N_L$ vs. the total number of genes in the genome (both regulatory and non-regulatory), $N_G = N_M - 1 + N_L$. However, since $N_L \ll N_M$ in all organism-specific networks, the quadratic scaling between $N_R$ and $N_G$ is equivalent to $N_L \sim N_M^2$. We further assume that, due to random selection, the $N_L$ nutrients are uniformly distributed among all levels $d$.
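The three toolbox rules and the genome counts $N_R = N_L$ and $N_G = N_M - 1 + N_L$ can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' simulation code: the universal tree is stored in a hypothetical `DOWNSTREAM` dictionary mapping each metabolite to the next metabolite on its path to the root, the six-node toy tree is made up for illustration, and the helper names `toolbox_step` and `layer_counts` are our own. Here $N_M$ counts every metabolite including the root, so the number of enzymes is $N_M - 1$.

```python
import random
from collections import Counter

# Hypothetical toy universal tree: each metabolite maps to its downstream
# neighbour on the path to the root (metabolite 0, the metabolic core).
DOWNSTREAM = {0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2}


def universal_leaves(downstream):
    """Leaves are metabolites that no reaction in the tree produces (nutrients)."""
    produced = {v for v in downstream.values() if v is not None}
    return [m for m in downstream if m not in produced]


def toolbox_step(downstream, organism, rng):
    """Apply the three toolbox rules once and return the number of new enzymes.

    `organism` is the set of metabolites already in the organism-specific
    network and must contain the root. Returns None if no new nutrient is left.
    """
    new = [m for m in universal_leaves(downstream) if m not in organism]
    if not new:
        return None
    m = rng.choice(new)              # 1) pick a random new nutrient
    added = 0
    while m not in organism:         # 2) walk the unique path toward the root
        organism.add(m)              # 3) add the metabolite and the reaction leaving it,
        m = downstream[m]            #    stopping at the first metabolite already present
        added += 1
    return added


def layer_counts(downstream, organism):
    """Per-layer counts N_M(d), N_L(d), N_B(d) of an organism-specific network."""
    leaves = set(universal_leaves(downstream))
    inputs = Counter(downstream[m] for m in organism if downstream[m] is not None)
    n_m, n_l, n_b = Counter(), Counter(), Counter()
    for m in organism:
        d, x = 0, m
        while downstream[x] is not None:  # distance d from the root
            x = downstream[x]
            d += 1
        n_m[d] += 1
        if m in leaves:
            n_l[d] += 1
        if inputs[m] >= 2:               # two incoming reactions: a branching point
            n_b[d] += 1
    return n_m, n_l, n_b


rng = random.Random(0)
organism = {0}                           # start from the metabolic core only
costs = []
while (cost := toolbox_step(DOWNSTREAM, organism, rng)) is not None:
    costs.append(cost)

n_m, n_l, n_b = layer_counts(DOWNSTREAM, organism)
N_M, N_L = sum(n_m.values()), sum(n_l.values())
N_R, N_G = N_L, N_M - 1 + N_L            # regulators and total number of genes
# All six metabolites end up in the organism: N_M = 6, N_L = N_R = 3, N_G = 8
print(costs, N_M, N_L, N_R, N_G)
```

Each call returns the number of enzymes the new pathway actually costs the organism; on larger trees these costs tend to shrink as the organism's network grows, which is the toolbox effect described in the Introduction.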
Therefore, the number of leaves at a given level is $N_L(d) = \tau N_L^{(U)}(d)$, where the fraction $\tau = N_L / N_L^{(U)}$ is the same at all levels. On the other hand, the fraction $\mu(d) = N_M(d) / N_M^{(U)}(d)$ varies from level to level. It tends to increase closer to the root of the tree, reaching 1 at $d = 0$ (the root node itself). To derive the equation for $\mu(d)$, one first notices that each of the $N_M(d+1)$ metabolites at level $d+1$ is converted to a unique intermediate metabolite at level $d$. Due to merging of pathways at $N_B(d)$ branching points, the number of unique intermediate metabolites at level $d$ is actually smaller: $N_M(d+1) - N_B(d)$. To calculate $N_B(d) \le N_B^{(U)}(d)$ one uses the fact that each of the two nodes upstream of a branching point in the universal network is present in the organism-specific network with probability $N_M(d+1) / N_M^{(U)}(d+1)$. The probability that both are present is $\left[N_M(d+1) / N_M^{(U)}(d+1)\right]^2$, and thus the number of branching points at level $d$ of the organism-specific metabolic network is

$$N_B(d) = N_B^{(U)}(d) \left[\frac{N_M(d+1)}{N_M^{(U)}(d+1)}\right]^2 .$$

The intermediate metabolites together with the new nutrients $N_L(d) = \tau N_L^{(U)}(d)$ entering at level $d$ add up to the total number of metabolites at level $d$:

$$N_M(d) = N_M(d+1) - N_B^{(U)}(d) \left[\frac{N_M(d+1)}{N_M^{(U)}(d+1)}\right]^2 + \tau N_L^{(U)}(d) \tag{1}$$

This equation allows one to iteratively calculate $N_M(d)$ for all $d$, starting from $N_M(d_{\max}) = \tau N_L^{(U)}(d_{\max})$. We will use this equation to derive the relationship between the number of leaves and the total number of nodes, first for a critical branching tree and then for a supercritical one.
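Equation (1) is a backward recursion, so it can be iterated numerically from $d_{\max}$ down to the root. The sketch below assumes the per-layer universal counts $N_M^{(U)}(d)$, $N_L^{(U)}(d)$ and $N_B^{(U)}(d)$ are given as Python lists indexed by $d$; the function name `organism_layer_sizes` and the toy layer counts are hypothetical. As a sanity check, setting $\tau = 1$ (the organism uses every nutrient) recovers the universal layer sizes exactly, because every node of the universal tree is a leaf, a linear node, or a branching point with two inputs.

```python
def organism_layer_sizes(nmu, nlu, nbu, tau):
    """Iterate Eq. (1) from d_max down to the root (d = 0).

    nmu[d], nlu[d], nbu[d]: universal counts N_M^(U)(d), N_L^(U)(d), N_B^(U)(d)
    tau: fraction N_L / N_L^(U) of all nutrients that the organism can use
    Returns the mean-field layer sizes N_M(d), d = 0..d_max (not necessarily integers).
    """
    d_max = len(nmu) - 1
    nm = [0.0] * (d_max + 1)
    nm[d_max] = tau * nlu[d_max]              # starting condition N_M(d_max) = tau * N_L^(U)(d_max)
    for d in range(d_max - 1, -1, -1):
        mu_next = nm[d + 1] / nmu[d + 1]       # mu(d+1): probability an upstream node is present
        nm[d] = nm[d + 1] - nbu[d] * mu_next ** 2 + tau * nlu[d]
    return nm


# Hypothetical toy universal tree with layers d = 0, 1, 2
nmu, nlu, nbu = [1, 2, 3], [0, 0, 3], [1, 1, 0]
print(organism_layer_sizes(nmu, nlu, nbu, tau=1.0))  # [1.0, 2.0, 3.0]
print(organism_layer_sizes(nmu, nlu, nbu, tau=0.5))
```

Because Eq. (1) treats the presence of the two upstream nodes as independent, it is a mean-field approximation; on a tree as small as this toy example the value at $d = 0$ for $\tau = 0.5$ comes out below 1 even though the root is always present.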
Toolbox model on a critical tree

The Galton–Watson branching process is the simplest process generating random trees, and we will consider its version in which a node can have two, one, or zero neighbors (parents) at the previous level with probabilities $p_2$, $p_1$ and $p_0$, respectively. If the average number of parents $\bar{k} = 2p_2 + p_1$ equals one, the process is referred to as critical, and if $\bar{k}$ is greater than one, the process is supercritical. More generally, critical and supercritical branching trees can be generated by a variety of random processes, such as directed percolation [3]. While for simplicity we use the Galton–Watson branching process in our derivation below, it can be readily extended to this more general case. The principal geometric difference between supercritical and critical trees is that in the former case the number of nodes in a layer, $N_M^{(U)}(d) \sim \bar{k}^{d}$, grows exponentially with $d$, while in a critical tree it grows at most algebraically (for the critical Galton–Watson process, $N_M^{(U)}(d) \sim d$). The other difference is that while the critical branching process always stops on its own at a certain finite height $d_{\max}$, a supercritical process can go on forever, so to generate a tree one has to terminate it manually at a predefined layer $d_{\max}$. The most significant feature of a critical tree is that it has much longer branches than a supercritical one of the same size. Indeed, the diameter (the maximal height) of a random critical tree with $N_M^{(U)}$ nodes is $d_{\max} \sim \left(N_M^{(U)}\right)^{1/2}$, while in a supercritical tree it is much shorter: $d_{\max} \sim \log N_M^{(U)} / \log \bar{k}$. Thus supercritical trees (unlike their critical counterparts) have the small-world
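To see the geometric difference between critical and supercritical trees numerically, one can grow Galton–Watson trees layer by layer from the root. The sketch below is an illustration under stated assumptions: the function name `galton_watson_layers`, the cutoff parameter `d_cap`, and the choice to treat all nodes of a forcibly terminated last layer as leaves are our own. The returned per-layer lists follow the notation $N_M^{(U)}(d)$, $N_L^{(U)}(d)$, $N_B^{(U)}(d)$ used in Eq. (1).

```python
import random


def galton_watson_layers(p0, p1, p2, d_cap, rng):
    """Grow a Galton-Watson tree outward from the root, one layer at a time.

    Every node independently gets 0, 1 or 2 upstream neighbours with
    probabilities p0, p1, p2. Returns per-layer lists (nmu, nlu, nbu) for
    d = 0..d_max. Growth is cut off at layer d_cap, and all nodes of a
    cut-off last layer are treated as leaves (an assumption of this sketch).
    """
    nmu, nlu, nbu = [1], [], []          # layer 0 holds only the root
    width = 1
    for _ in range(d_cap):
        ks = rng.choices([0, 1, 2], weights=[p0, p1, p2], k=width)
        nlu.append(ks.count(0))          # nodes with no upstream neighbour: leaves
        nbu.append(ks.count(2))          # nodes with two upstream neighbours: branching points
        width = sum(ks)                  # size of the next layer
        if width == 0:                   # the process died out on its own
            break
        nmu.append(width)
    else:                                # forced termination at d_cap
        nlu.append(width)
        nbu.append(0)
    return nmu, nlu, nbu


rng = random.Random(42)
crit = galton_watson_layers(0.25, 0.5, 0.25, d_cap=1000, rng=rng)  # mean k = 1
sup = galton_watson_layers(0.0, 0.5, 0.5, d_cap=12, rng=rng)       # mean k = 1.5
for name, (nmu, _, _) in [("critical", crit), ("supercritical", sup)]:
    print(name, "height:", len(nmu) - 1, "nodes:", sum(nmu))
```

With $p_0 = p_2 = 1/4$ and $p_1 = 1/2$ the mean number of parents is $\bar{k} = 1$ (critical); with $p_1 = p_2 = 1/2$ it is $\bar{k} = 3/2$ (supercritical), and since $p_0 = 0$ that tree never terminates on its own, which is why the cutoff is needed. A single critical tree is often tiny (the root itself has no parents with probability $p_0$), so a meaningful comparison of heights requires averaging over many random seeds.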


Publication date: 2010